Summary

This paper proposes a hierarchical method for estimating the location parameters of a multivariate vector in the presence of missing data. At the $i$-th step of this procedure, the estimate of the location parameters for the non-missing components of the vector is obtained by combining, in an iterative fashion, the information in the subset of observations with those components observed with updated estimates of the location parameters from all subsets with even more missing components. If the variance-covariance matrix is known, the resulting estimator is unbiased with the smallest variance in a class of unbiased estimators, provided the missing data are ignorable. It is also shown that the resulting estimator based on consistent estimators of the variance-covariance matrices is asymptotically unbiased with the smallest variance. This approach can also be extended to some cases of non-ignorable missing data. Applying the methodology to data with random dropouts yields the well-known Kaplan-Meier estimator.

Some key words: Parameter estimation; Missing data; Hierarchical technique 
for missing data 



1. Introduction 

Censored and missing data are unavoidable in many rectangular data sets, and many different approaches to handling such data have been developed in recent years. Little and Rubin (2002) considered a taxonomy of missing-data methods consisting of procedures based on completely recorded units, weighting procedures, imputation-based procedures and model-based procedures. All these procedures can be classified into two general categories: imputational and non-imputational techniques.

The first category contains a variety of single and multiple imputation methods, including mean substitution, last observation carried forward, and imputational techniques for likelihood-based approaches. Multiple imputation (MI) (Rubin, 1987) is now an accepted standard, and several statistical packages supply easy-to-use software for applying it (see, for example, procedures MI and MIANALYZE in SAS, 2002). Markov chain Monte Carlo (MCMC) provides a flexible tool for MI; some illustrative MCMC examples are described by Schafer (1997). The Expectation-Maximization algorithm (Dempster, Laird and Rubin, 1977) for maximum likelihood estimation and the approximate Bayesian bootstrap (Rubin and Schenker, 1986) for stratified samples also belong to this category. In addition, the small-sample as well as the large-sample properties of estimators based on multiple imputation have been investigated (Barnard and Rubin, 1999).

The second category consists of non-imputational techniques, of which the complete-case and available-case methods are the most popular (Verbeke and Molenberghs, 2000). In addition, considerable methodology has been developed for obtaining maximum likelihood estimators: parameter estimation with incomplete data in general linear models (Ibrahim, 1990), and pattern-set mixture models (Little, 1993), which include analyses based on pattern-mixture models and on selection models. In an analysis based on pattern-mixture models, inference for a function of the location parameters is obtained by combining, in some weighted fashion, the estimates obtained from each pattern of missing components observed in the data (Molenberghs, Michiels, Kenward and Diggle, 1998). Pattern-mixture models are the closest analogues of the technique proposed in this paper, but the proposed method does not depend on assuming a parametric family.

To develop a new distribution-free, non-imputation approach to estimation with missing data, we reviewed several methods proposed for incorporating auxiliary information into statistical estimation. One important method, due to Pugachev (1973), is the method of correlated processes, which exploits the correlation between auxiliary information and empirical data to incorporate the auxiliary information into statistical estimation. This method was later developed and extended by Gal'chenko and Gurevich (1991), who incorporated the estimators from previous experiments into the current estimator. The estimators obtained by these approaches have smaller, or asymptotically smaller, variances than the current estimator. A further extension, which is the subject of this paper, provides a methodological basis for statistical estimation with missing data.

The new method is introduced in Section 2, and its asymptotic properties are derived in the Appendix. The application to missing data due to right censoring is considered in Section 3, which shows that in this important special case the method produces the well-known Kaplan-Meier estimator. Further applications to samples from a bivariate random variable, with ignorable missing data and with a special case of non-ignorable missing data, are presented in Section 4. That section considers estimation of the vector of means under a general pattern of ignorable missing data and change-score estimation under random dropout. Conclusions are stated in Section 5.



2. Methodology 

2.1. Notation 

Suppose $X_1, \dots, X_N$ are independent and identically distributed random vectors with common probability distribution $P_X(x)$, where $x \in \mathcal{X} \subset \mathbb{R}^Q$, and $N$ and $Q$ are finite, strictly positive integers. However, $X_1, \dots, X_N$ are not observed directly: the data are subject to a missing-data mechanism described by the corresponding nonresponse indicator vectors $R_1, \dots, R_N$. Here $R_n = (R_{n1}, \dots, R_{nQ})$ and $R_{nq} \in \{0, 1\}$, $n = 1, \dots, N$, $q = 1, \dots, Q$. In this notation $R_{nq} = 1$ indicates response and $R_{nq} = 0$ indicates nonresponse. What is actually observed is the random vectors $Y_1, \dots, Y_N$, where $Y_n = (Y_{n1}, \dots, Y_{nQ})$, $Y_{nq} = X_{nq}$ if $R_{nq} = 1$, and $Y_{nq}$ is missing if $R_{nq} = 0$, $n = 1, \dots, N$, $q = 1, \dots, Q$.

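Let $\theta = (\theta_1, \dots, \theta_S)$ take values in $\mathbb{R}^S$, where $\theta_s = \int_{\mathcal{X}} \varphi_s(x)\, dP_X(x)$ with $\varphi_s(x)$ a known function defined on $\mathbb{R}^Q$, $\theta_s \in \mathbb{R}$, $s = 1, \dots, S$.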

Several examples of $\varphi_s(x) = \varphi_s(x_1, \dots, x_Q)$ follow.

In this paper, location-parameter estimation is emphasized. If $\varphi_s(x_1, \dots, x_Q) = x_1$, then $\theta_s$ is the mean of $x_1$. If $\varphi_s(x_1, \dots, x_Q)$ is the indicator function of some event defined by the variables $x_1, \dots, x_Q$, then $\theta_s$ is the probability of this event; hence a cumulative distribution function (CDF) can be estimated. The resulting CDF estimator can further be used to estimate percentiles, the median, the interquartile range, and many other parameters.

This approach is not restricted to location-parameter estimation. If $\varphi_s(x_1, \dots, x_Q) = x_1 x_Q$, then $\theta_s$ is a mixed moment of $x_1$ and $x_Q$. In general, we do not exclude more intricate forms of $\varphi_s(x_1, \dots, x_Q)$, for example $\varphi_s(x_1, \dots, x_Q) = x_Q \log(x_1 x_2)$.
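For fully observed data, each $\theta_s$ is estimated by the sample average of $\varphi_s$. The short Python sketch below evaluates this average for the examples of $\varphi_s$ just listed. The function names and the data are hypothetical. With missing data, the same averages would instead be computed within each subsample, as described in Subsection 2.2.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def estimate_theta(phi: Callable[[Vector], float], sample: Sequence[Vector]) -> float:
    """Sample analogue of theta_s: the average of phi_s over fully observed vectors."""
    return sum(phi(x) for x in sample) / len(sample)


# The examples above for Q = 3 (0-based indexing: x[0] is x_1 and x[-1] is x_Q).
def phi_mean(x):
    return x[0]                                # theta_s is the mean of x_1


def phi_event(x):
    return 1.0 if x[0] <= 2.0 else 0.0         # theta_s = P(x_1 <= 2), one CDF value


def phi_mixed(x):
    return x[0] * x[-1]                        # mixed moment of x_1 and x_Q


def phi_log(x):
    return x[-1] * math.log(x[0] * x[1])       # x_Q log(x_1 x_2); needs x_1 x_2 > 0


sample = [(1.0, 2.0, 3.0), (2.0, 1.0, 1.0), (3.0, 3.0, 2.0), (4.0, 0.5, 4.0)]
print(estimate_theta(phi_mean, sample))    # 2.5
print(estimate_theta(phi_event, sample))   # 0.5
print(estimate_theta(phi_mixed, sample))   # 6.75
```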

Though location-parameter estimation is the main objective of this paper, the methodology presented in this section accommodates all these cases.

First, consider an ignorable missing-data mechanism. How to apply this approach to non-ignorable missing data is considered in Subsection 2.4, with a special case in Section 4.

2.2. Hierarchical structure 

Let $R_{ij}$ denote an indicator vector having exactly $(i-1)$ zeros, for $i = 1, \dots, Q$. For a given $i$ there are $j = 1, \dots, \binom{Q}{i-1}$ different patterns with exactly $(i-1)$ zeros. Let $J_{ij}$ denote the subsample size for the $i$-th level and the $j$-th pattern, where $J_{ij} \ge 0$. Let $\Theta_{ij}$ denote the subset of the $S$ parameters $\theta_1, \dots, \theta_S$ that is estimable using only the observations having the missing pattern defined by $R_{ij}$, and let $\hat\Theta_{ij}$ denote this sample estimate, assuming that $J_{ij} > 0$. The $R$'s and the corresponding estimates can be arranged into a hierarchical structure as $i$ increases.

Example. If Q = 3, the hierarchical structure is as follows.

• the subsample which contains complete observations defines the first level or root level (i = 1) and corresponds to the indicator vector (1, 1, 1);

• up to three subsamples define the second level (i = 2) and correspond to 
the three missing patterns (1, 1, 0), (1, 0, 1), and (0, 1, 1); 

• up to three subsamples define the third level (i = 3) and correspond to 
the three missing patterns (1,0,0), (0,1,0), and (0,0,1). 
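The level structure can be generated mechanically: level $i$ consists of all $\binom{Q}{i-1}$ indicator vectors with $i - 1$ zeros, and $J_{ij}$ is the number of rows with that response pattern. The Python sketch below illustrates this. The helper names are hypothetical, and rows whose indicator vector is all zeros carry no information and are ignored.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]


def level_patterns(Q: int, i: int) -> List[Pattern]:
    """Indicator vectors R_ij with exactly (i - 1) zeros (1 = observed, 0 = missing)."""
    patterns = [
        tuple(0 if q in zeros else 1 for q in range(Q))
        for zeros in combinations(range(Q), i - 1)
    ]
    # Descending lexicographic order reproduces the ordering of the Q = 3 example.
    return sorted(patterns, reverse=True)


def subsample_sizes(R: Sequence[Sequence[int]]) -> Dict[int, Dict[Pattern, int]]:
    """Subsample sizes J_ij for every level i = 1, ..., Q and every pattern at that level."""
    counts = Counter(tuple(r) for r in R)
    Q = len(R[0])
    return {i: {p: counts[p] for p in level_patterns(Q, i)} for i in range(1, Q + 1)}


for i in (1, 2, 3):
    print(i, level_patterns(3, i))
# 1 [(1, 1, 1)]
# 2 [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
# 3 [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
```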

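We now use this hierarchy to improve the estimator $\hat\Theta_{ij}$ by using the information about the unknown value of $\Theta_{ij}$ from the next level, $i + 1$. The improved estimator is

$$\hat\Theta^{o}_{ij} = \hat\Theta_{ij} - K_{ij}\,(K^{*}_{ij})^{-1}\,(\hat B_{ij} - \tilde B_{ij}). \qquad (1)$$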



The elements of the $K$ matrices and $B$ vectors in equation (1) are defined below. Assume there are $S^* = S(i,j)$ elements in $\Theta_{ij}$, and without loss of generality assume these are numbered $1, \dots, S^*$; that is, $\Theta_{ij} = (\theta_1, \dots, \theta_{S^*})$.

Then $\hat B_{ij} = (\hat B_{ij1}, \dots, \hat B_{ijS^*})$ and $\tilde B_{ij} = (\tilde B_{ij1}, \dots, \tilde B_{ijS^*})$. To define these vectors, let $B_{ijk}$ denote the subvector of $\Theta_{ij}$ with its $k$-th component removed, for $k = 1, \dots, S^*$. Two estimates of $B_{ijk}$ are computed from the data. The first is based on the subsample defined by $R_{ij}$; this is $\hat B_{ijk}$, the subvector of $\hat\Theta_{ij}$ with its $k$-th component removed. The second, $\tilde B_{ijk}$, is based on data collected at the $(i+1)$st level. It is possible that there are no observations in the latter subsample, in which case the corresponding subvector is dropped from both $\hat B_{ij}$ and $\tilde B_{ij}$. The rectangular matrix $K_{ij}$ is a block matrix defined as

$$K_{ij} = \bigl(\mathrm{Cov}(\hat\Theta_{ij}, \hat B_{ijk})\bigr)_{k=1,\dots,S^*}.$$

The square block matrix $K^{*}_{ij}$ is defined as

$$K^{*}_{ij} = \bigl(\mathrm{Cov}(\hat B_{ijk}, \hat B_{ijl}) + I[k=l]\,\mathrm{Cov}(\tilde B_{ijk}, \tilde B_{ijk})\bigr)_{k,l=1,\dots,S^*},$$

where $I[\cdot]$ is the indicator function.

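The estimator (1) has the variance-covariance matrix

$$\mathrm{Cov}(\hat\Theta^{o}_{ij}, \hat\Theta^{o}_{ij}) = \mathrm{Cov}(\hat\Theta_{ij}, \hat\Theta_{ij}) - K_{ij}\,(K^{*}_{ij})^{-1}\,K_{ij}^{T}, \qquad (2)$$

which defines the smallest dispersion ellipsoid in the class

$$\Theta^{A}_{ij} = \hat\Theta_{ij} - A_{ij}\,(\hat B_{ij} - \tilde B_{ij}) \qquad (3)$$

with respect to different choices of the matrix $A_{ij}$ of appropriate dimensions. The estimators $\Theta^{A}_{ij}$ form a class of unbiased estimators of $\Theta_{ij}$.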
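Once the blocks of $K_{ij}$ and $K^{*}_{ij}$ have been stacked into ordinary matrices, the correction in (1) and the variance in (2) are plain matrix expressions. The following NumPy sketch evaluates both for known covariance matrices. The function name `improved_estimator` and its argument layout are illustrative assumptions, not part of the method's specification.

```python
import numpy as np


def improved_estimator(theta_hat, B_hat, B_tilde, cov_theta, K, K_star):
    """Evaluate (1) and (2) for known covariance matrices.

    theta_hat : (p,) estimate from the ij-subsample
    B_hat     : (m,) stacked subvectors B_hat_ijk from the same subsample
    B_tilde   : (m,) stacked estimates B_tilde_ijk from level i + 1
    cov_theta : (p, p) Cov(theta_hat, theta_hat)
    K         : (p, m) Cov(theta_hat, B_hat)
    K_star    : (m, m) Cov(B_hat - B_tilde), assumed symmetric positive definite
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    K = np.asarray(K, dtype=float)
    diff = np.asarray(B_hat, dtype=float) - np.asarray(B_tilde, dtype=float)
    # A = K (K*)^{-1}, obtained by solving K* A^T = K^T rather than forming the inverse.
    A = np.linalg.solve(np.asarray(K_star, dtype=float), K.T).T
    theta_o = theta_hat - A @ diff                          # equation (1)
    cov_o = np.asarray(cov_theta, dtype=float) - A @ K.T    # equation (2)
    return theta_o, cov_o


# Toy scalar case in the spirit of Section 4.3: the auxiliary estimate targets the
# same parameter and has the same variance as theta_hat, so K = 1 and K* = 2.
theta_o, cov_o = improved_estimator([2.0], [2.0], [1.0], [[1.0]], [[1.0]], [[2.0]])
print(theta_o, cov_o)
```

In this toy case the correction returns the average of the two estimates (1.5) and halves the variance (0.5), which is the familiar inverse-variance combination.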

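In practice the true values of $K_{ij}$, $K^{*}_{ij}$ and $\mathrm{Cov}(\hat\Theta_{ij}, \hat\Theta_{ij})$ are usually not available, in which case their consistent estimators $\hat K_{ij}$, $\hat K^{*}_{ij}$ and $\widehat{\mathrm{Cov}}(\hat\Theta_{ij}, \hat\Theta_{ij})$ are used instead.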

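This substitution modifies (1) and (2) to the following equations:

$$\check\Theta_{ij} = \hat\Theta_{ij} - \hat K_{ij}\,(\hat K^{*}_{ij})^{-1}\,(\hat B_{ij} - \tilde B_{ij}) \qquad (4)$$

with

$$\widehat{\mathrm{Cov}}(\check\Theta_{ij}, \check\Theta_{ij}) = \widehat{\mathrm{Cov}}(\hat\Theta_{ij}, \hat\Theta_{ij}) - \hat K_{ij}\,(\hat K^{*}_{ij})^{-1}\,\hat K_{ij}^{T}. \qquad (5)$$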

In addition to $\det(K^{*}_{ij}) > 0$, a new requirement comes from (4) and (5): $\hat K^{*}_{ij}$ should be positive definite. From $\det(K^{*}_{ij}) > 0$ and the consistency of $\hat K^{*}_{ij}$ we conclude that, for a sufficiently large sample size, $\det(\hat K^{*}_{ij}) > 0$ with probability tending to one.

2.3. Assumptions 

In order to obtain the unbiased estimator defined by (1) with the smallest dispersion ellipsoid defined by (2), we need (for every $ij$-subsample):

• to know $K^{*}_{ij}$, which should be positive definite,

• to know $K_{ij}$ (in many cases $K_{ij}$ consists of elements of $K^{*}_{ij}$),

• $E[\hat\Theta_{ij}] = \Theta_{ij}$, and

• $E[\tilde B_{ij}] = B_{ij}$.

When $K^{*}_{ij}$ and $K_{ij}$ are not known, their consistent estimators provide unbiasedness and the smallest dispersion asymptotically.

According to Little and Rubin (2002, p. 119), a missing-data mechanism is ignorable if (i) the missing data are missing at random and (ii) the parameters governing $X$ and $R$ are distinct, that is, they lie in different parameter spaces.

For ignorable missing data, the missing-data mechanism that splits the original sample into subsamples is independent of the vector $\theta$.

Hence the methodology proposed in Subsection 2.2 can be applied to ignorable missing data.

2.4. Adjustment for non-ignorable missing data mechanism 

What happens when the missing-data mechanism is not ignorable? In this case it is reasonable to assume that some or all of the $S^*$ components of the vector $E(\tilde B_{ij})$ differ from those of $E(\hat B_{ij})$; in other words, the missing data introduce bias.

Suppose that the missing-data mechanism is governed not only by parameters that are distinct from $\Theta_{ij}$ but also by $W$ parameters defined in the parameter space of $\Theta_{ij}$. If $W < S^*$, then one can expect that there exist $S^* - W$ parameters independent of the missing-data mechanism, and these can be used as components of the vectors $\hat B_{ij}$ and $\tilde B_{ij}$.

Hence the goal is to find the parameters that are independent of the missing-data mechanism and to use them in formulas (1), (2), (4) and (5). An example of such a case is considered in Section 4.

To illustrate the applicability of the methodology described in this section, consider the following special case.
3. Random Dropout



Right-censored data are among the most common problems statisticians face. This problem can be formulated as a missing-data problem with a monotone missing-data structure.

Suppose $X_1, \dots, X_N$ are independent and identically distributed random variables with an unknown cumulative distribution function $F(t)$, $t \in [0, \infty)$. However, $X_1, \dots, X_N$ are not observed directly, since some of them are distorted by $M_1, \dots, M_N$ generated by a random missing-data mechanism. The observed sample is $Y_1, \dots, Y_N$, where $Y_n = X_n$ if $M_n > X_n$ and $Y_n = M_n$ otherwise, $n = 1, \dots, N$.

Assume the observed events occur at $t_1 < \dots < t_S$, where $S < N$. Consider an arbitrary event time $t_s$. On the basis of the complete observations (those not censored at or before $t_s$), the empirical estimators $\hat F(t_s)$ and $\hat F(t_{s-1})$ can be calculated. In addition to these, an estimator $\tilde F(t_{s-1})$ has been obtained from data independent of the complete observations. At $s = 2$ the estimator $\tilde F(t_1)$ uses only the observations censored at or before $t_2$. At an arbitrary $s$-th step, however, $\tilde F(t_{s-1})$ represents an estimator absorbing information from all previously censored observations. We do not need to define its form explicitly, because the recursive approach considered below uses the estimator $F^{o}(t_{s-1})$, which absorbs the information from $\hat F(t_{s-1})$ and $\tilde F(t_{s-1})$.

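From (1) we obtain the following equation:

$$F^{o}(t_s) = \hat F(t_s) - \frac{\mathrm{Cov}\bigl(\hat F(t_s), \hat F(t_{s-1})\bigr)}{\mathrm{Var}\bigl(\hat F(t_{s-1})\bigr) + \mathrm{Var}\bigl(\tilde F(t_{s-1})\bigr)}\,\bigl(\hat F(t_{s-1}) - \tilde F(t_{s-1})\bigr). \qquad (6)$$

Considering the class of unbiased estimators

$$F^{\lambda}(t_{s-1}) = \hat F(t_{s-1}) - \lambda\bigl[\hat F(t_{s-1}) - \tilde F(t_{s-1})\bigr],$$

the estimator

$$F^{o}(t_{s-1}) = \hat F(t_{s-1}) - \frac{\mathrm{Var}\bigl(\hat F(t_{s-1})\bigr)}{\mathrm{Var}\bigl(\hat F(t_{s-1})\bigr) + \mathrm{Var}\bigl(\tilde F(t_{s-1})\bigr)}\,\bigl(\hat F(t_{s-1}) - \tilde F(t_{s-1})\bigr) \qquad (7)$$

provides the smallest variance

$$\mathrm{Var}\bigl(F^{o}(t_{s-1})\bigr) = \frac{\mathrm{Var}\bigl(\hat F(t_{s-1})\bigr)\,\mathrm{Var}\bigl(\tilde F(t_{s-1})\bigr)}{\mathrm{Var}\bigl(\hat F(t_{s-1})\bigr) + \mathrm{Var}\bigl(\tilde F(t_{s-1})\bigr)}.$$

The estimator (7) can be rewritten as

$$F^{o}(t_{s-1}) = \tilde F(t_{s-1}) + \frac{\mathrm{Var}\bigl(\tilde F(t_{s-1})\bigr)}{\mathrm{Var}\bigl(\hat F(t_{s-1})\bigr) + \mathrm{Var}\bigl(\tilde F(t_{s-1})\bigr)}\,\bigl(\hat F(t_{s-1}) - \tilde F(t_{s-1})\bigr). \qquad (8)$$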
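From (8) we have

$$\tilde F(t_{s-1}) = F^{o}(t_{s-1}) - \frac{\mathrm{Var}\bigl(\tilde F(t_{s-1})\bigr)}{\mathrm{Var}\bigl(\hat F(t_{s-1})\bigr)}\,\bigl(\hat F(t_{s-1}) - F^{o}(t_{s-1})\bigr), \qquad (9)$$

and from the expression for the smallest variance,

$$\mathrm{Var}\bigl(\tilde F(t_{s-1})\bigr) = \frac{\mathrm{Var}\bigl(\hat F(t_{s-1})\bigr)\,\mathrm{Var}\bigl(F^{o}(t_{s-1})\bigr)}{\mathrm{Var}\bigl(\hat F(t_{s-1})\bigr) - \mathrm{Var}\bigl(F^{o}(t_{s-1})\bigr)}. \qquad (10)$$

It is interesting to see that (10) can be written as

$$\Bigl(\mathrm{Var}\bigl(F^{o}(t_{s-1})\bigr)\Bigr)^{-1} = \Bigl(\mathrm{Var}\bigl(\hat F(t_{s-1})\bigr)\Bigr)^{-1} + \Bigl(\mathrm{Var}\bigl(\tilde F(t_{s-1})\bigr)\Bigr)^{-1},$$

which shows that the information (inverse variance) in $F^{o}(t_{s-1})$ is the sum of the information in $\hat F(t_{s-1})$ and in $\tilde F(t_{s-1})$.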

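Substituting (10) into (9), we obtain

$$\tilde F(t_{s-1}) = F^{o}(t_{s-1}) - \frac{\mathrm{Var}\bigl(F^{o}(t_{s-1})\bigr)}{\mathrm{Var}\bigl(\hat F(t_{s-1})\bigr) - \mathrm{Var}\bigl(F^{o}(t_{s-1})\bigr)}\,\bigl(\hat F(t_{s-1}) - F^{o}(t_{s-1})\bigr). \qquad (11)$$

Applying the representation (11) of $\tilde F(t_{s-1})$ to equation (6) for $F^{o}(t_s)$, we have

$$F^{o}(t_s) = \hat F(t_s) - \frac{\mathrm{Cov}\bigl(\hat F(t_s), \hat F(t_{s-1})\bigr)}{\mathrm{Var}\bigl(\hat F(t_{s-1})\bigr)}\,\bigl(\hat F(t_{s-1}) - F^{o}(t_{s-1})\bigr). \qquad (12)$$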



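Neither $\tilde F(t_{s-1})$ nor its variance appears in (12), since $F^{o}(t_{s-1})$ and its variance absorb all the information carried by $\tilde F(t_{s-1})$ and its variance.
Using the facts that

$$\mathrm{Cov}\bigl(\hat F(t_s), \hat F(t_{s-1})\bigr) = \frac{F(t_{s-1})\bigl(1 - F(t_s)\bigr)}{n_s}, \qquad \mathrm{Var}\bigl(\hat F(t_{s-1})\bigr) = \frac{F(t_{s-1})\bigl(1 - F(t_{s-1})\bigr)}{n_s},$$

where $n_s$ is the number of complete observations at $t_s$, we have

$$\frac{\mathrm{Cov}\bigl(\hat F(t_s), \hat F(t_{s-1})\bigr)}{\mathrm{Var}\bigl(\hat F(t_{s-1})\bigr)} = \frac{1 - F(t_s)}{1 - F(t_{s-1})}. \qquad (13)$$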

In (13) the cumulative distribution function $F(\cdot)$ is not known. Substituting its empirical estimator yields

$$F^{o}(t_s) = \hat F(t_s) - \frac{1 - \hat F(t_s)}{1 - \hat F(t_{s-1})}\,\bigl(\hat F(t_{s-1}) - F^{o}(t_{s-1})\bigr) = 1 - \frac{1 - \hat F(t_s)}{1 - \hat F(t_{s-1})}\,\bigl(1 - F^{o}(t_{s-1})\bigr). \qquad (14)$$

From (14) we have

$$1 - F^{o}(t_s) = \frac{1 - \hat F(t_s)}{1 - \hat F(t_{s-1})}\,\bigl(1 - F^{o}(t_{s-1})\bigr). \qquad (15)$$



The estimator $F^{o}(t_{s-1})$ on the right side of (15) was derived at each of the previous steps by using $\hat F(\cdot)$ in place of the unknown $F(\cdot)$ (as was done in (13)). Writing the survival function $S(\cdot)$ in place of $1 - F(\cdot)$, equation (15) defines the well-known Kaplan-Meier estimator (Kaplan and Meier, 1958).
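The recursion (15) can be checked numerically against the textbook product-limit formula. The Python sketch below is a minimal illustration under one explicit assumption: no censoring time coincides with an event time. The complete subsample in the text excludes observations censored exactly at $t_s$, whereas the usual Kaplan-Meier risk set keeps them, so such ties could make the two computations differ. The function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def recursive_estimator(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Recursion (15) on the survival scale:
    S_o(t_s) = S_o(t_{s-1}) * (1 - Fhat(t_s)) / (1 - Fhat(t_{s-1})),
    where Fhat is the empirical CDF of the observations not censored at or before t_s
    (events[n] == 1 marks an observed event, 0 a censored time)."""
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    surv, prev, out = 1.0, float("-inf"), []
    for t_s in event_times:
        complete = [t for t, d in zip(times, events) if d == 1 or t > t_s]
        n_s = len(complete)
        F_s = sum(t <= t_s for t in complete) / n_s
        F_prev = sum(t <= prev for t in complete) / n_s   # Fhat(t_0) = 0 at the first step
        surv *= (1 - F_s) / (1 - F_prev)
        out.append((t_s, surv))
        prev = t_s
    return out


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Textbook product-limit estimator S(t_s) = prod_k (1 - d_k / r_k), r_k = #{Y >= t_k}."""
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    surv, out = 1.0, []
    for t_s in event_times:
        r = sum(t >= t_s for t in times)
        d = sum(1 for t, e in zip(times, events) if t == t_s and e == 1)
        surv *= 1 - d / r
        out.append((t_s, surv))
    return out


# Hypothetical right-censored sample; censoring times 2, 5, 7 differ from all event times.
times = [1, 2, 3, 3, 5, 6, 7, 8]
events = [1, 0, 1, 1, 0, 1, 0, 1]
for (t, s_rec), (_, s_km) in zip(recursive_estimator(times, events), kaplan_meier(times, events)):
    print(t, round(s_rec, 4), round(s_km, 4))
```

Both columns agree at every event time, which illustrates numerically that (15) reproduces the product-limit estimator for this sample.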



4. Bivariate Case 

Let $X_1, \dots, X_N$ be independent and identically distributed random vectors from a bivariate distribution with mean vector $\mu$ and covariance matrix $\Sigma$, where $X_n = (X_n^{(1)}, X_n^{(2)})^T$, $\mu = (\mu_1, \mu_2)^T$, and

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}$$

is a positive definite covariance matrix.

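Applying the hierarchical structure developed in Section 2, we summarize its content in the following table:

| Level $i$ | Subsample $j$ | Pattern $R_{ij}$ | $J_{ij}$ | Parameters $\Theta_{ij}$ | Estimator $\hat\Theta_{ij}$ |
|---|---|---|---|---|---|
| 1 | 1 | (1, 1) | $J_{11}$ | $(\mu_1, \mu_2)^T$ | $(\bar X_{111}, \bar X_{112})^T$ |
| 2 | 1 | (1, 0) | $J_{21}$ | $\mu_1$ | $\bar X_{211}$ |
| 2 | 2 | (0, 1) | $J_{22}$ | $\mu_2$ | $\bar X_{222}$ |

Here $\bar X_{ijq}$ denotes the sample mean of the $q$-th component in the $ij$-subsample.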


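The estimator of the vector $(\mu_1, \mu_2)^T$ that uses all the information in the sample becomes

$$(\hat\mu^{o}_1, \hat\mu^{o}_2)^T = (\bar X_{111}, \bar X_{112})^T - A^{o}\,(\bar X_{111} - \bar X_{211},\ \bar X_{112} - \bar X_{222})^T, \qquad (16)$$

where, since $\mathrm{Cov}\bigl((\bar X_{111}, \bar X_{112})^T\bigr) = \Sigma/J_{11}$, $\mathrm{Var}(\bar X_{211}) = \sigma_{11}/J_{21}$ and $\mathrm{Var}(\bar X_{222}) = \sigma_{22}/J_{22}$,

$$A^{o} = \frac{\Sigma}{J_{11}}\left(\frac{\Sigma}{J_{11}} + \begin{pmatrix} \sigma_{11}/J_{21} & 0 \\ 0 & \sigma_{22}/J_{22} \end{pmatrix}\right)^{-1}. \qquad (17)$$

When the covariances in (17) are known, the estimator (16) is unbiased with the smallest variance in class (3). If these covariances are not known, their consistent estimates can be used instead. The resulting estimator is then no longer optimal, but it converges to (16) in distribution (see Proposition 2 in the Appendix).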
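A NumPy sketch of (16) and (17) in the feasible form (4) follows. The function name is hypothetical. Plugging in the complete-case sample covariance matrix for $\Sigma$ is an assumption of this example; any consistent estimator could be used instead.

```python
import numpy as np


def bivariate_means(complete, only_first, only_second):
    """Feasible version of (16)-(17): Sigma is replaced by the complete-case sample covariance.

    complete    : (J11, 2) array of fully observed pairs, pattern (1, 1)
    only_first  : (J21,) first components observed under pattern (1, 0)
    only_second : (J22,) second components observed under pattern (0, 1)
    Assumes J21 > 0 and J22 > 0; otherwise the component is dropped as in Section 2.2.
    """
    complete = np.asarray(complete, dtype=float)
    J11, J21, J22 = len(complete), len(only_first), len(only_second)
    sigma = np.cov(complete, rowvar=False)                          # consistent estimate of Sigma
    theta_hat = complete.mean(axis=0)                               # (Xbar_111, Xbar_112)
    tilde = np.array([np.mean(only_first), np.mean(only_second)])   # (Xbar_211, Xbar_222)
    K = sigma / J11                                                 # here B_hat coincides with theta_hat
    K_star = K + np.diag([sigma[0, 0] / J21, sigma[1, 1] / J22])
    A = np.linalg.solve(K_star, K.T).T                              # A^o = K (K*)^{-1}
    return theta_hat - A @ (theta_hat - tilde)                      # equation (16)


est = bivariate_means([[1, 1], [1, -1], [-1, 1], [-1, -1]], [2.0, 2.0], [4.0])
print(est)
```

The four complete pairs in this toy input have zero sample covariance. Each component therefore reduces to the pooled mean of all observations of that component, here 2/3 and 4/5.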

An important special case is $J_{22} = 0$, which we discuss next.

4.1. Change Score Estimation

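Let $\delta = \mu_1 - \mu_2$ be the change score to be estimated. This difference can be estimated from the complete observations: $\hat\delta = \bar X_{111} - \bar X_{112}$. With $J_{22} = 0$, the estimator (1) takes the form

$$\hat\delta^{o} = \hat\delta - \frac{J_{21}}{J_{11} + J_{21}}\left(1 - \frac{\sigma_{12}}{\sigma_{11}}\right)(\bar X_{111} - \bar X_{211}) \qquad (18)$$

with variance

$$\mathrm{Var}(\hat\delta^{o}) = \frac{1}{J_{11}}\left(\sigma_{11} - 2\sigma_{12} + \sigma_{22} - \frac{J_{21}}{J_{11} + J_{21}}\,\frac{(\sigma_{11} - \sigma_{12})^2}{\sigma_{11}}\right). \qquad (19)$$

If $\sigma_{11} = \sigma_{12}$, then $\hat\delta^{o} = \hat\delta$ (the estimator based on complete cases).
If $\sigma_{12} = 0$, then $\hat\delta^{o} = \frac{J_{11}}{J_{11} + J_{21}}\bar X_{111} + \frac{J_{21}}{J_{11} + J_{21}}\bar X_{211} - \bar X_{112}$ (the estimator based on available cases).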
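A Monte Carlo sketch can check (18) and (19). It assumes bivariate normal data, missingness completely at random, and known $\sigma_{11}$, $\sigma_{12}$, $\sigma_{22}$. All parameter values are hypothetical. For normal data the subsample means have exact normal sampling distributions, so they are drawn directly, which keeps the simulation short.

```python
import numpy as np


def change_score(xbar_111, xbar_112, xbar_211, J11, J21, s11, s12):
    """Estimator (18) for J22 = 0, with sigma_11 and sigma_12 treated as known."""
    delta_hat = xbar_111 - xbar_112                     # complete-case estimator
    return delta_hat - J21 / (J11 + J21) * (1 - s12 / s11) * (xbar_111 - xbar_211)


def variance_19(J11, J21, s11, s12, s22):
    """Variance (19) of estimator (18)."""
    return (s11 - 2 * s12 + s22 - J21 / (J11 + J21) * (s11 - s12) ** 2 / s11) / J11


rng = np.random.default_rng(0)
s11, s12, s22, J11, J21, reps = 1.0, 0.5, 2.0, 30, 20, 20000
mu = np.array([1.0, 0.0])                               # true change score delta = 1
cov = np.array([[s11, s12], [s12, s22]])
xbar_11 = rng.multivariate_normal(mu, cov / J11, size=reps)    # (Xbar_111, Xbar_112)
xbar_211 = rng.normal(mu[0], np.sqrt(s11 / J21), size=reps)    # independent subsample
est = change_score(xbar_11[:, 0], xbar_11[:, 1], xbar_211, J11, J21, s11, s12)
print(est.mean(), est.var(), variance_19(J11, J21, s11, s12, s22))
```

The simulated mean should be close to 1. The simulated variance should be close to $1.9/30 \approx 0.0633$, which is below the complete-case variance $2/30 \approx 0.0667$.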



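4.2. Change Score Estimation under Compound Symmetry

Let us assume $\Sigma = \sigma^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$; then

$$\hat\delta^{o} = \hat\delta - \frac{J_{21}}{J_{11} + J_{21}}\,(1 - \rho)\,(\bar X_{111} - \bar X_{211}) \qquad (20)$$

with variance

$$\mathrm{Var}(\hat\delta^{o}) = \frac{\sigma^2}{J_{11}}\left(2(1 - \rho) - \frac{J_{21}}{J_{11} + J_{21}}\,(1 - \rho)^2\right). \qquad (21)$$

If $\rho = 0$, then $\hat\delta^{o} = \hat\delta - \frac{J_{21}}{J_{11} + J_{21}}(\bar X_{111} - \bar X_{211})$ and $\mathrm{Var}(\hat\delta^{o}) = \frac{\sigma^2}{J_{11}}\left(2 - \frac{J_{21}}{J_{11} + J_{21}}\right)$.
If $\rho = 1$, then $\hat\delta^{o} = \hat\delta$ and $\mathrm{Var}(\hat\delta^{o}) = 0$.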
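Dividing (21) by the complete-case variance $\mathrm{Var}(\hat\delta) = 2\sigma^2(1 - \rho)/J_{11}$ gives the relative efficiency $1 - \frac{J_{21}(1 - \rho)}{2(J_{11} + J_{21})}$ for $\rho < 1$. The small Python sketch below tabulates it for a hypothetical design with 30 complete cases and 20 dropouts.

```python
def var_cs(J11, J21, sigma2, rho):
    """Variance (21) of estimator (20) under compound symmetry."""
    w = J21 / (J11 + J21)
    return sigma2 / J11 * (2 * (1 - rho) - w * (1 - rho) ** 2)


def var_complete_case(J11, sigma2, rho):
    """Variance of the complete-case estimator delta_hat under compound symmetry."""
    return 2 * sigma2 * (1 - rho) / J11


J11, J21, sigma2 = 30, 20, 1.0   # hypothetical design
for rho in (0.0, 0.5, 0.9):
    ratio = var_cs(J11, J21, sigma2, rho) / var_complete_case(J11, sigma2, rho)
    print(f"rho = {rho}: Var(delta_o) / Var(delta_hat) = {ratio:.3f}")
# rho = 0.0: Var(delta_o) / Var(delta_hat) = 0.800
# rho = 0.5: Var(delta_o) / Var(delta_hat) = 0.900
# rho = 0.9: Var(delta_o) / Var(delta_hat) = 0.980
```

The gain from the dropouts is largest at $\rho = 0$ and vanishes as $\rho \to 1$, in line with the special cases above.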



Now we return to the case where $J_{11} > 0$, $J_{21} > 0$, and $J_{22} > 0$, but assume that the data are not missing at random.

4.3. Non-ignorable Missing Data 

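With non-ignorable missing data, one should find the parameters that do not change under the missing-data transformation. Let us assume that the missing data are the result of changed experimental conditions; for example, a shift $\Delta$ appears in the observed component $X^{(1)}$ or $X^{(2)}$ when the other component is missing. The value of $\Delta$ is unknown.

Using only the incomplete observations, we obtain $\tilde\delta = \bar X_{211} - \bar X_{222}$. In $\tilde\delta$ the shift $\Delta$ cancels, so $E(\hat\delta) = E(\tilde\delta) = \delta$. For these estimators $\mathrm{Var}(\hat\delta) = J_{11}^{-1}(\sigma_{11} - 2\sigma_{12} + \sigma_{22})$ and $\mathrm{Var}(\tilde\delta) = J_{21}^{-1}\sigma_{11} + J_{22}^{-1}\sigma_{22}$, and the estimator (1) takes the form

$$\hat\delta^{\Delta o} = \hat\delta - \frac{\sigma_{11} - 2\sigma_{12} + \sigma_{22}}{\sigma_{11}\left(1 + \frac{J_{11}}{J_{21}}\right) - 2\sigma_{12} + \sigma_{22}\left(1 + \frac{J_{11}}{J_{22}}\right)}\,(\hat\delta - \tilde\delta) \qquad (22)$$

with variance

$$\mathrm{Var}(\hat\delta^{\Delta o}) = \frac{\sigma_{11} - 2\sigma_{12} + \sigma_{22}}{J_{11}}\left(1 - \frac{\sigma_{11} - 2\sigma_{12} + \sigma_{22}}{\sigma_{11}\left(1 + \frac{J_{11}}{J_{21}}\right) - 2\sigma_{12} + \sigma_{22}\left(1 + \frac{J_{11}}{J_{22}}\right)}\right). \qquad (23)$$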

If the variances and covariances used in (22) and (23) are not available, their consistent estimators can be used instead. According to Proposition 2, the asymptotic properties continue to hold.
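The following Monte Carlo sketch checks (22) and (23) under the shift model described above. It assumes bivariate normal data and known covariances, and every numerical value is hypothetical. The shift $\Delta = 0.7$ is used only to generate the incomplete cases; the estimator never uses it. As a sanity check, (23) equals $\bigl(\mathrm{Var}(\hat\delta)^{-1} + \mathrm{Var}(\tilde\delta)^{-1}\bigr)^{-1}$, the inverse-variance combination of the two unbiased estimators.

```python
import numpy as np


def shift_cancelling_estimator(xbar_111, xbar_112, xbar_211, xbar_222, J11, J21, J22, s11, s12, s22):
    """Estimator (22): corrects delta_hat with delta_tilde = Xbar_211 - Xbar_222."""
    v = s11 - 2 * s12 + s22                                          # J11 * Var(delta_hat)
    denom = s11 * (1 + J11 / J21) - 2 * s12 + s22 * (1 + J11 / J22)
    delta_hat = xbar_111 - xbar_112
    delta_tilde = xbar_211 - xbar_222                                # the shift cancels here
    return delta_hat - v / denom * (delta_hat - delta_tilde)


def variance_23(J11, J21, J22, s11, s12, s22):
    """Variance (23) of estimator (22)."""
    v = s11 - 2 * s12 + s22
    denom = s11 * (1 + J11 / J21) - 2 * s12 + s22 * (1 + J11 / J22)
    return v / J11 * (1 - v / denom)


rng = np.random.default_rng(1)
s11, s12, s22, J11, J21, J22, reps = 1.0, 0.5, 2.0, 30, 20, 25, 20000
mu, shift = np.array([1.0, 0.0]), 0.7     # delta = 1; the shift value is hypothetical
cov = np.array([[s11, s12], [s12, s22]])
# Subsample means drawn from their exact normal sampling distributions.
xbar_11 = rng.multivariate_normal(mu, cov / J11, size=reps)
xbar_211 = rng.normal(mu[0] + shift, np.sqrt(s11 / J21), size=reps)
xbar_222 = rng.normal(mu[1] + shift, np.sqrt(s22 / J22), size=reps)
est = shift_cancelling_estimator(xbar_11[:, 0], xbar_11[:, 1], xbar_211, xbar_222,
                                 J11, J21, J22, s11, s12, s22)
print(est.mean(), est.var(), variance_23(J11, J21, J22, s11, s12, s22))
```

For these values (23) is about 0.0441, compared with $2/30 \approx 0.0667$ for the complete-case estimator $\hat\delta$.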



5. Conclusion 

If the variance-covariance structure of the model is known, the estimators proposed in this paper are unbiased and have the smallest variance in a class of unbiased estimators. When the parameters of the variance-covariance structure have to be estimated with consistent estimators, the resulting estimators are asymptotically unbiased with the smallest variance.

These estimators are not restricted to monotone missing-data structures and can be derived from observations with a general pattern of missing data. Although these estimators are obtained for ignorable missing data, they can also be derived for some non-ignorable missing-data mechanisms. A special case of non-ignorable missing data is considered in Section 4.

Unlike many likelihood-based methods, this approach does not require assumptions about parametric families; it works whenever the first two moments of the underlying distribution are finite.

Assuming asymptotic normality of the estimators obtained on subsamples, the final estimators obtained with the proposed methodology are asymptotically normal as well. The two propositions in the Appendix give the asymptotic mean and variance of these estimators.

Many standard statistical procedures may be used with these estimators, for 
example, sample size determination or hypothesis testing. 

It was shown in Section 3 that the well-known Kaplan-Meier estimator results from applying this approach to right-censored data with random dropout.

Overall, the nonparametric basis, the absence of imputation in any form, and the properties established for finite and large sample sizes distinguish the proposed estimator from others and make it applicable in many practical cases.
Appendix: Large Sample Properties



If $K_{ij}$, $K^{*}_{ij}$ and $\mathrm{Cov}(\hat\Theta_{ij}, \hat\Theta_{ij})$ are known and $(K^{*}_{ij})^{-1}$ exists, then the estimator (1) can be calculated, and its asymptotic properties are described by the following result.

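Proposition 1. Let us consider the vectors $\xi_J = \sqrt{J_{ij}}\,(\hat\Theta_{ij} - \Theta_{ij})$, $\psi^{(s)}_J = \sqrt{J_{ij}}\,(\hat B_{ijs} - B_{ijs})$ and $\zeta^{(s)}_{J_s} = \sqrt{J_{ijs}}\,(\tilde B_{ijs} - B_{ijs})$, $s = 1, \dots, S^*$ (for simplicity we omit the $ij$ subscript in what follows, writing $J = J_{ij}$ and $J_s = J_{ijs}$), with the following properties:

1) $\xi_J \to \xi$ in distribution as $J \to +\infty$. Also assume $E(\xi) = 0$ and that all elements of the covariance matrix $\mathrm{Cov}(\xi, \xi) = C^{(\theta,\theta)}$ are finite.

2) $\zeta^{(s)}_{J_s} \to \zeta^{(s)}$ in distribution as $J_s \to +\infty$. Also assume $E(\zeta^{(s)}) = 0$ and that all elements of the covariance matrix $\mathrm{Cov}(\zeta^{(s)}, \zeta^{(s)}) = \tilde C^{(s,s)}$ are finite, for all $s = 1, \dots, S^*$.

3) $\psi^{(s)}_J \to \psi^{(s)}$ in distribution as $J \to +\infty$. Also assume $E(\psi^{(s)}) = 0$ and that all elements of the covariance matrices $\mathrm{Cov}(\psi^{(s)}, \psi^{(q)}) = C^{(s,q)}$ and $\mathrm{Cov}(\xi, \psi^{(s)}) = C^{(\theta,s)}$ are finite, for all $s, q = 1, \dots, S^*$.

If $\det(K^*) > 0$ and $J/J_s \to w_s \in [0, +\infty)$ as $J$ and/or $J_s$ go to $+\infty$, then $\eta_J = \sqrt{J}\,(\hat\Theta^{o} - \Theta)$ converges to a random vector $\eta$ with $E(\eta) = 0$ and

$$\mathrm{Cov}(\eta, \eta) = C^{(o,o)} = C^{(\theta,\theta)} - C^{(\theta,B)}\,(C^{(+)})^{-1}\,(C^{(\theta,B)})^{T},$$

where the matrices $C^{(\theta,B)}$ and $C^{(+)}$ are assembled from the other matrices: $C^{(\theta,B)} = \|C^{(\theta,s)}\|_{s=1,\dots,S^*}$ and $C^{(+)} = \|C^{(s,q)} + I[s=q]\,w_s\,\tilde C^{(s,s)}\|_{s,q=1,\dots,S^*}$.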

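Proof. Since $\hat\Theta^{o}$ is an unbiased estimator of $\Theta$, we have $E(\eta_J) = \sqrt{J}\,(E\hat\Theta^{o} - \Theta) = 0$. Hence $E(\eta) = 0$.

From (2) we have $\mathrm{Cov}(\eta_J, \eta_J) = J\bigl(\mathrm{Cov}(\hat\Theta, \hat\Theta) - K(K^*)^{-1}K^{T}\bigr)$.
Applying the facts that

(1) $J\,\mathrm{Cov}(\hat\Theta, \hat\Theta)$ converges to $C^{(\theta,\theta)}$ as $J \to +\infty$,

(2) $K(K^*)^{-1}$ converges to $C^{(\theta,B)}(C^{(+)})^{-1}$ as $J/J_s \to w_s$, with $J$ and/or $J_s$ going to $+\infty$, and

(3) $J K^{T}$ converges to $(C^{(\theta,B)})^{T}$ as $J \to +\infty$,

we conclude that $\mathrm{Cov}(\eta_J, \eta_J)$ converges to $C^{(o,o)}$. Q.E.D.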

In the expression for $C^{(o,o)}$, the term $C^{(\theta,B)}(C^{(+)})^{-1}(C^{(\theta,B)})^{T}$ consists of quadratic forms and corresponds to the reduction of the original dispersion ellipsoid. When different quadratic forms (defined by a risk function) are applied to $C^{(o,o)}$, the term $C^{(\theta,B)}(C^{(+)})^{-1}(C^{(\theta,B)})^{T}$ yields different non-negative numbers that show the asymptotic improvement in the risk function used.

Proposition 1 does not consider the case $w_s = +\infty$, because then the information from the $s$-th subsample at level $(i+1)$ is overwhelmed by the information in the $ij$-subsample and cannot improve the asymptotic properties of the estimators derived from the $ij$-subsample. In this case the $s$-th subsample at level $(i+1)$ should be excluded from consideration.

Another extreme situation arises when $w_s = 0$, which corresponds to incorporating exact knowledge: in this case $\tilde B_s$ is effectively known with zero variance.

Proposition 1 defines the asymptotic properties of the estimator (1), but this estimator cannot be used in many practical cases because $K(K^*)^{-1}$ is usually not known. In such cases the estimator $\check\Theta$ obtained in (4) by substituting $\hat K(\hat K^*)^{-1}$ for $K(K^*)^{-1}$ should be used. The asymptotic properties of $\check\Theta$ are described as follows.

Proposition 2. Suppose the assumptions of Proposition 1 hold and every element of $J\bigl(\widehat{\mathrm{Cov}}(\hat\Theta, \hat\Theta) - \mathrm{Cov}(\hat\Theta, \hat\Theta)\bigr)$, $J(\hat K - K)$, and $J(\hat K^* - K^*)$ converges to some random variable with mean zero and finite variance.

Then $\sqrt{J}(\check\Theta - \Theta)$ converges in distribution to $\eta$ as $J \to +\infty$, where $\eta$ is defined in Proposition 1.
Proof.

Notice that $\sqrt{J}(\check\Theta - \Theta)$ differs from $\sqrt{J}(\hat\Theta^{o} - \Theta)$ only in that $\hat K$ and $\hat K^*$ are used instead of $K$ and $K^*$.

The elements of $\hat K(\hat K^*)^{-1}$ are continuous functions of the elements of $\hat K$ and $\hat K^*$, and all these elements converge in probability to their true values. Hence, by Theorem 5.5.4 (Casella and Berger, 2002, p. 233), $\hat K(\hat K^*)^{-1}$ converges in probability to $K(K^*)^{-1}$.

Now Slutsky's Theorem (Casella and Berger, 2002, p. 239) implies that $\sqrt{J}(\check\Theta - \Theta)$ converges in distribution to $\eta$. Q.E.D.



Remark: When all $w_s = 0$, the estimator (4) coincides with the estimator derived by the method of correlated processes (Pugachev, 1973) and has the same asymptotic properties as the empirical likelihood estimator in the presence of auxiliary information (Zhang, 1996).